Pivot Probability Induction for Statistical Machine Translation with Topic Similarity

Authors

  • Yanzhou HUANG
  • Xiaodong SHI
  • Jinsong SU
  • Yidong CHEN
  • Guimin HUANG
Abstract

Previous work employs the pivot language approach to conduct statistical machine translation when only a limited amount of bilingual corpus is available. Conventional solutions based on phrase-table combination overlook the semantic discrepancy between the source-pivot corpus and the pivot-target corpus, and consequently estimate the probabilities of the induced translation rules inaccurately. In this paper, the latent topic structure of the document-level training data is learned automatically and each phrase translation rule is assigned a topic distribution. Phrase probability induction is then carried out on the basis of topic similarity, allowing the translation system to consider the semantic relatedness among different rules. Using BLEU as the metric of translation accuracy, we find that our system achieves an absolute improvement on the in-domain test compared with the baseline system.
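The abstract does not give the induction formula, so the following is only a minimal sketch of the general idea under stated assumptions: standard pivot triangulation, p(t|s) ≈ Σ_p p(p|s) · p(t|p), with each pivot path additionally weighted by the similarity of the two rules' topic distributions. The use of cosine similarity, the multiplicative weighting, the per-source renormalization, and all names (`cosine`, `induce_phrase_table`, the toy phrases) are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two topic distributions (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def induce_phrase_table(src_pvt, pvt_tgt):
    """Induce source-target probabilities through a pivot language.

    src_pvt maps (source, pivot) -> (p(pivot|source), topic_vector)
    pvt_tgt maps (pivot, target) -> (p(target|pivot), topic_vector)
    Each pivot path is scaled by the topic similarity of its two rules.
    """
    # Index pivot-target rules by pivot phrase for fast joining.
    by_pivot = defaultdict(list)
    for (pivot, target), (prob, topics) in pvt_tgt.items():
        by_pivot[pivot].append((target, prob, topics))

    # Sum the topic-weighted contribution of every shared pivot phrase.
    scores = defaultdict(float)
    for (source, pivot), (prob_sp, topics_sp) in src_pvt.items():
        for target, prob_pt, topics_pt in by_pivot.get(pivot, []):
            weight = cosine(topics_sp, topics_pt)
            scores[(source, target)] += prob_sp * prob_pt * weight

    # Renormalize so the scores for each source phrase sum to one.
    totals = defaultdict(float)
    for (source, _), score in scores.items():
        totals[source] += score
    return {
        pair: score / totals[pair[0]]
        for pair, score in scores.items()
        if totals[pair[0]] > 0
    }


# Toy example: the pivot word "bank" is ambiguous. Topic 0 is "finance",
# topic 1 is "geography"; all numbers are made up for illustration.
src_pvt = {("banco", "bank"): (1.0, [0.9, 0.1])}
pvt_tgt = {
    ("bank", "yinhang"): (0.5, [0.9, 0.1]),  # financial sense
    ("bank", "hean"): (0.5, [0.1, 0.9]),     # riverbank sense
}
table = induce_phrase_table(src_pvt, pvt_tgt)
```

In this toy case plain triangulation would score both targets equally, while the topic weighting favors the target whose rule shares the source rule's topic profile.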


Similar resources

Improving Pivot-Based Statistical Machine Translation by Pivoting the Co-occurrence Count of Phrase Pairs

To overcome the scarcity of bilingual corpora for some language pairs in machine translation, pivot-based SMT uses a pivot language as a "bridge" to generate source-target translations from source-pivot and pivot-target translations. One of the key issues is to estimate the probabilities for the generated phrase pairs. In this paper, we present a novel approach to calculate the translation probabi...


Statistical Machine Translation without a Source-side Parallel Corpus Using Word Lattice and Phrase Extension

Statistical machine translation (SMT) requires a parallel corpus between the source and target languages. Although a pivot-translation approach can be applied to a language pair that does not have a parallel corpus directly between them, it requires both source–pivot and pivot–target parallel corpora. We propose a novel approach to apply SMT to a resource-limited source language that has no par...


End-to-end statistical machine translation with zero or small parallel texts

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can be used to combine various sign...


The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...


Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language such as English. In this paper, in order to show the validity and usability of the pivot-based approach, we evaluate the approach in combination with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word as...



Publication date: 2013